@SmritiAgrawal04

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Contributor

@alamb alamb left a comment

Thank you @SmritiAgrawal04

I think this PR needs some tests to show that it works.

Contributor

@crepererum crepererum left a comment

I do somewhat agree with @alamb. I know it's close to impossible to write an integration test for this, but maybe we can at least have a unit test for parse_url?

@tustvold marked this pull request as draft on December 13, 2025, 13:22
let first_label = host.split('.').next().unwrap_or_default();
self.account_name = Some(validate(first_label)?);

let container = parsed.path_segments().unwrap().next().expect(

nit: OneLake uses workspace terminology

// Regex to match WS-PL FQDN: "{workspaceid}.z??.dfs.fabric.microsoft.com"
// workspaceid = 32 hex chars, z?? = z + first two chars of workspaceid
lazy_static::lazy_static! {
static ref WS_PL_REGEX: Regex = Regex::new(r"^(?P<workspaceid>[0-9a-f]{32})\.z(?P<xy>[0-9a-f]{2})\.(dfs|blob)\.fabric\.microsoft\.com$").unwrap();

Let's also add support for .onelake.fabric.microsoft.com.
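For reference, the host format described in the regex comment above can also be checked with the standard library alone. The following is a minimal sketch, not the PR's implementation: `WsPlHost` and `parse_ws_pl_host` are hypothetical names, and accepting `onelake` alongside `dfs` and `blob` follows the suggestion in this comment rather than any published format.

```rust
/// Parts of a workspace private-link (WS-PL) host name.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct WsPlHost<'a> {
    pub workspace_id: &'a str,
    pub xy: &'a str,
    pub service: &'a str,
}

/// Splits "{workspaceid}.z{xy}.{service}.fabric.microsoft.com" into its parts.
/// Hypothetical helper; it is not part of object_store.
pub fn parse_ws_pl_host(host: &str) -> Option<WsPlHost<'_>> {
    let rest = host.strip_suffix(".fabric.microsoft.com")?;
    let mut labels = rest.split('.');
    let workspace_id = labels.next()?;
    let zone = labels.next()?;
    let service = labels.next()?;
    if labels.next().is_some() {
        return None; // more labels than the format allows
    }

    // Lowercase hex only, matching the [0-9a-f] class in the regex.
    let is_lower_hex = |s: &str| s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));

    let xy = zone.strip_prefix('z')?;
    if workspace_id.len() != 32 || !is_lower_hex(workspace_id) {
        return None;
    }
    if xy.len() != 2 || !is_lower_hex(xy) {
        return None;
    }
    // Assumption: "onelake" is accepted next to "dfs" and "blob".
    if !matches!(service, "dfs" | "blob" | "onelake") {
        return None;
    }
    Some(WsPlHost { workspace_id, xy, service })
}
```

A host such as `account.blob.fabric.microsoft.com` has too few labels and is rejected, so it would fall through to whatever handling exists for non-WS-PL hosts.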

let xy = captures.name("xy").unwrap().as_str();

// Validate z?? matches first 2 chars of workspaceid
if &workspaceid[0..2] != xy {

remove this validation

}

// Otherwise, check Fabric global / Onelake API FQDN
if host.ends_with(DFS_FABRIC_SUFFIX) || host.ends_with(BLOB_FABRIC_SUFFIX) {

If we are checking for the global endpoint, shouldn't these conditions be negated, i.e. (!host.ends_with(DFS_FABRIC_SUFFIX) && !host.ends_with(BLOB_FABRIC_SUFFIX))?
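One way to settle this question is to evaluate the condition against concrete hosts. The sketch below assumes values for the two suffix constants, since their definitions are not shown in this thread; `has_fabric_suffix` is a hypothetical name for the condition in the snippet.

```rust
// Assumed values; the PR's actual constants are not shown in this thread.
const DFS_FABRIC_SUFFIX: &str = "dfs.fabric.microsoft.com";
const BLOB_FABRIC_SUFFIX: &str = "blob.fabric.microsoft.com";

/// The condition from the snippet above: true when `host` ends with one of
/// the Fabric DFS/Blob suffixes.
pub fn has_fabric_suffix(host: &str) -> bool {
    host.ends_with(DFS_FABRIC_SUFFIX) || host.ends_with(BLOB_FABRIC_SUFFIX)
}
```

By De Morgan's law, the negated form proposed in the comment is exactly `!has_fabric_suffix(host)`. Under the assumed constants, `onelake.dfs.fabric.microsoft.com` satisfies the original condition, while `westus-api.onelake.fabric.microsoft.com` satisfies only the negated one.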

if host.ends_with(DFS_FABRIC_SUFFIX) || host.ends_with(BLOB_FABRIC_SUFFIX) {
let labels: Vec<&str> = host.split('.').collect();
let account_name = if labels.len() >= 2 && labels[0].contains("api") && labels[1] == "onelake" {
format!("{}-{}", labels[0], labels[1])

If we are referring to the workspace ID as account_name, then this will not work for the non-PL scenario. In those cases it will give account_name as "westus-api-onelake".
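The behavior described here can be reproduced in isolation. This sketch mirrors the label handling from the snippet above; `derive_account_name` is a hypothetical name, the example host is an assumption, and the `else` branch is also an assumption because the original code is cut off at that point.

```rust
/// Mirrors the label handling in the snippet above.
/// The `else` branch is assumed, since the original is truncated.
pub fn derive_account_name(host: &str) -> String {
    // `split` always yields at least one item, so `labels[0]` is safe.
    let labels: Vec<&str> = host.split('.').collect();
    if labels.len() >= 2 && labels[0].contains("api") && labels[1] == "onelake" {
        // Joins the first two labels, e.g. "westus-api" and "onelake".
        format!("{}-{}", labels[0], labels[1])
    } else {
        labels[0].to_string()
    }
}
```

For an assumed host like `westus-api.onelake.dfs.fabric.microsoft.com`, the first branch joins the first two labels into `westus-api-onelake`, which is not a workspace ID.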

@SmritiAgrawal04 marked this pull request as ready for review on January 9, 2026, 05:03
@SmritiAgrawal04
Author

Hi @alamb & @crepererum,

I added the unit tests as suggested. Could you please review the PR?

let mut builder = MicrosoftAzureBuilder::new();
builder
.parse_url("https://account.blob.fabric.microsoft.com/container")
.parse_url("https://account.blob.fabric.microsoft.com/")
Contributor

Why did this test case change?

Author

By mistake. Reverted.

}
Some((a, "dfs.fabric.microsoft.com")) | Some((a, "blob.fabric.microsoft.com")) => {
self.account_name = Some(validate(a)?);
// Attempt to infer the container name from the URL
Contributor

Why remove this comment? It seems helpful.

}
"https" => {
// Regex to match WS-PL FQDN:
// "{workspaceid}.z??.(onelake|dfs|blob).fabric.microsoft.com"
Contributor

Can you please also add an example URL for each of the APIs you are adding support for?

Author

Added for the WS-PL DFS/Blob endpoints. We are waiting for the PM to confirm the ABFSS and WS-PL OneLake domains.

self.container_name = Some(validate(parsed.username())?);
self.account_name = Some(validate(a)?);
self.use_fabric_endpoint = true.into();
} else if let Some(a) = host.strip_suffix("-api.onelake.fabric.microsoft.com") {
Contributor

  • Is *-api.onelake.fabric.microsoft.com a publicly documented endpoint? If yes, can you point to the Microsoft doc so we can cite it in code/tests?

I don't see it in https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api
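To make concrete which hosts the `strip_suffix` branch in the snippet above accepts, here is a minimal standalone sketch. `api_onelake_prefix` is a hypothetical name, and the empty-prefix filter is an addition for illustration; the PR may handle that case elsewhere.

```rust
/// Returns the part of `host` in front of "-api.onelake.fabric.microsoft.com".
/// Hypothetical helper; rejecting an empty prefix is an addition in this sketch.
pub fn api_onelake_prefix(host: &str) -> Option<&str> {
    host.strip_suffix("-api.onelake.fabric.microsoft.com")
        .filter(|prefix| !prefix.is_empty())
}
```

Any host of the form `{prefix}-api.onelake.fabric.microsoft.com` matches, whatever the prefix is, so the accepted set is only as trustworthy as the documentation behind this endpoint pattern.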

Contributor

@alamb alamb left a comment

Thanks @SmritiAgrawal04 -- I left some more questions.

BTW I asked Codex for a review of this PR with this prompt:

please review this PR (checked out in this directory): https://github.com/apache/arrow-rs-object-store/pull/552

  Specifically ensure that the format of the URLs supported matches publicly published microsoft urls and provide documentation references

And this was the output, which I found helpful:

───────────────────────────────────────────────────────────────────────────────────────────────

• Findings

  • High: -api.onelake.fabric.microsoft.com is accepted for both abfs(s) and https, but I can’t find this FQDN in
    the public OneLake URL docs. The published endpoints are onelake.dfs.fabric.microsoft.com and workspace-scoped
    https://{workspaceId}.z{xy}.dfs.fabric.microsoft.com/...
    (see https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api).
    If -api.onelake is internal or not publicly documented, this should be gated or removed.
  • Medium: The WS‑PL regex no longer verifies that z{xy} matches the first two hex chars of the workspace id
    (src/azure/builder.rs:692-708). The published WS‑PL format is
    https://{workspaceId}.z{first2}.dfs.fabric.microsoft.com (same doc). Without the check, non‑published FQDNs
    are accepted.
  • Medium: WS‑PL regex allows .onelake.fabric.microsoft.com (src/azure/builder.rs:692-699). The public docs list
    dfs (and blob endpoints for blob access), but I don’t see onelake as a WS‑PL subdomain. Please confirm with a
    Microsoft reference or remove to match published formats.
  • Low: WS‑PL regex only allows lowercase hex ([0-9a-f]{32}) (src/azure/builder.rs:697). If Microsoft treats
    workspace IDs as case‑insensitive GUIDs in URLs, this will reject valid uppercase forms; consider documenting
    that only lowercase is supported or make the regex case‑insensitive.

Doc references

  • OneLake access API URL formats: https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api
  • Private Link / workspace-specific OneLake endpoints (if this PR targets WS‑PL): please confirm the exact doc
    section that defines the workspaceId.z{xy}.dfs.fabric.microsoft.com pattern and whether any onelake WS‑PL host
    is documented (I could not find it in public docs).

Questions / assumptions

  • Is *-api.onelake.fabric.microsoft.com a publicly documented endpoint? If yes, can you point to the Microsoft
    doc so we can cite it in code/tests?
  • Should WS‑PL accept only dfs/blob subdomains, or is .onelake.fabric.microsoft.com explicitly published?
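For the Medium finding about the z{xy} check and the Low finding about case sensitivity, both checks are small enough to sketch with the standard library. The names `is_workspace_id` and `zone_matches` are hypothetical, and whether workspace IDs are really case-insensitive in URLs remains an open question in this thread; the sketch only shows what a case-insensitive variant could look like.

```rust
/// True if `id` is exactly 32 hex digits, in either case.
/// Hypothetical helper; case-insensitivity is an assumption, not a documented rule.
pub fn is_workspace_id(id: &str) -> bool {
    id.len() == 32 && id.bytes().all(|b| b.is_ascii_hexdigit())
}

/// True if `xy` equals the first two characters of `workspace_id`,
/// ignoring ASCII case. Returns false when the ID is shorter than two bytes.
pub fn zone_matches(workspace_id: &str, xy: &str) -> bool {
    workspace_id
        .get(..2)
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case(xy))
}
```

Keeping the two checks as separate functions would let the z{xy} validation be enabled or dropped independently of the hex check, which is the point of disagreement between the earlier review comment and the Codex finding.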

@tustvold
Contributor

FYI #604 may be related.

@SmritiAgrawal04
Author

Hi @alamb,

I have addressed all comments. Regarding findings #1 and #3, we plan to add these endpoints to the public documentation; the PR for that is already out. Could you please approve these changes in the meantime? Thanks.
